Abstract

We define a new approach to compilation to distributed architec- 
tures based on networks of abstract machines. Using it we can im- 
plement a generalised and fully transparent form of Remote Proce- 
dure Call that supports calling higher-order functions across node 
boundaries, without sending actual code. Our starting point is the 
classic Krivine machine, which implements reduction for untyped 
call-by-name PCF. We successively add the features that we need 
for distributed execution and show the correctness of each addi- 
tion. Then we construct a two-level operational semantics, where 
the high level is a network of communicating machines, and the low 
level is given by local machine transitions. Using these networks, 
we arrive at our final system, the Krivine Net. We show that Krivine 
Nets give a correct distributed implementation of the Krivine ma- 
chine, which preserves both termination and non-termination prop- 
erties. All the technical results have been formalised and proved 
correct in AGDA. We also implement a prototype compiler which 
we compare with previous distributing compilers based on Girard's 
Geometry of Interaction and on Game Semantics. 

Categories and Subject Descriptors D.3.1 [Programming Languages]:
Formal Definitions and Theory - semantics; D.3.2 [Programming
Languages]: Language Classifications - concurrent, distributed,
and parallel languages; F.1.1 [Computation by Abstract
Devices]: models of computation

Keywords abstract machines; distributed execution; simulation 
relation; Agda 

1. Seamless distribution 

There are two extreme views of programming languages. At one 
extreme we have the machine-oriented view, where the program- 
ming language is construed as the medium through which a pro- 
grammer instructs a computer to perform certain operations. The 
other extremal view is mathematical-logical in which the program- 
ming language is a medium of expressing abstract computational 
concepts such as algorithms or data structures. Historically, the first 
programming languages were, by necessity, machine-oriented, but 
algorithmic (i.e. mathematical-logical) machine independent lan- 
guages appeared soon after (FORTRAN, LISP, ALGOL, etc.). The 
case for machine independent programming has been made, quite
successfully, a long time ago. 

Machine independence means that programs can just be recom- 
piled to run on different devices with similar architectures. Archi- 
tecture independence pushes this idea even further to devices with 
different architectures, such as conventional CPUs, distributed sys- 
tems, GPUs and reconfigurable hardware. In a series of papers [12- 
14] we have examined the possibility of lifting machine indepen- 
dence to architecture independence in the context of distributed 
computing. The reason is that in distributed computing the deploy- 
ment of a program is often reflected at the level of source code. For 
example, these two ERLANG programs: 

c(A_pid) -> receive X -> A_pid ! X*X end, c(A_pid).
main() ->
    C_pid = spawn(f, c, [self()]), C_pid ! 3,
    receive X -> C_pid ! 4, receive Y -> X+Y end
    end.

and 

c() -> receive {Pid, X} -> Pid ! X*X end, c().
b(A_pid, C_pid) -> receive
        request0 -> C_pid ! {self(), 3},
            receive X -> A_pid ! X end;
        request1 -> C_pid ! {self(), 4},
            receive X -> A_pid ! X end end,
    b(A_pid, C_pid).
main() ->
    C_pid = spawn(f, c, []),
    B_pid = spawn(f, b, [self(), C_pid]),
    B_pid ! request0, receive X ->
        B_pid ! request1, receive Y -> X+Y end end.

perform the same function, which in a computation on a single node 
we may write as let f = λx. x * x in f 3 + f 4, except
that the deployment is different. The first program is distributed 
on two nodes whereas the second is distributed on three nodes. Our 
proposal is as follows: 

The program should not need to specify details of the 
runtime deployment pattern. 

This means that communication and process management should 
largely disappear from the source code, as they are the means 
by which deployment patterns are implemented. To explain our 
metaphor, they are the seam which we aim to eliminate. For ex- 
ample, the deployment pattern could be indicated using a configu- 
ration file or pragma-like code annotations assigning nodes to arbi- 
trary sub-terms of the program: 

let f = (λx. x * x)@C in (f 3 + f 4)@B

We would like the invocations of f in the program to semantically
work exactly like a local function call even though the function
is located on the node C and invoked from node B. The compiler
should automatically handle the communication for us by
triggering a small exchange of messages between the two nodes. For
example, node B would start by sending a message to C, requesting 
the evaluation of f and providing the location of the argument and 
where to return to. 

At this early stage we make no claims that our approach 
is a practical alternative to established distributed programming 
methodologies, but we believe that there should be room for ex- 
ploring machine (or architecture) independent, purely algorithmic, 
languages in the context of distributed computing. Our approach 
is in contrast to communication-oriented idioms such as MPI and 
ERLANG. It is also philosophically different to domain specific 
languages for distributed computing, which expose communica- 
tion but use concepts associated with high-level programming lan- 
guages such as types [17] to avoid certain classes of errors. The 
closest to our approach are remote procedure calls, so our research 
programme can be reformulated as an answer to the question: 

Can remote procedure calls be incorporated transpar- 
ently, correctly and efficiently into the programming lan- 
guage? 

1.1 Contribution 

In this paper we present an extension of the classic Krivine abstract 
machine for the call-by-name lambda calculus [19] for seamless 
distributed computing. Our previous work gave a compilation tech- 
nique [13] based on the Geometry of Interaction (GOI) token ab- 
stract machine [23] and another one [14] based on game semantics 
(GAMC) [15]. They both achieve seamless distribution but have 
certain apparently unavoidable inefficiencies. The GOI-based
compilation has the potential to be locally efficient but has a possibly
insurmountable communication overhead, whereas the games-based
compiler communicates efficiently but requires very high computa- 
tional overhead on each node. The current approach, which we are 
also exploring for the SECD machine and call-by-value [12],
combines the best of both worlds: it communicates efficiently by
keeping the size of the message within a small fixed bound and it
executes efficiently on each node. In fact, the compilation scheme
degenerates to that of the conventional Krivine machine if the whole
program is deployed on a single node. An additional advantage 
which this current technique offers is that, unlike the exotic GOI 
and games-based approaches, it is in some technical sense standard 
so that it can interface trivially with legacy code which was com- 
piled to the Krivine machine. 

Because the formal definitions of Krivine Nets are in places
fairly intricate, we adopt a fully formal approach, expressing all
the definitions and the correctness proofs in the dependently-typed
programming language AGDA [27]. This allows us to present
technical results with a high
degree of confidence and to remove all proof details, which can be 
found elsewhere [1], from the paper. In this paper we can focus on 
the exposition. The reader is not assumed to know AGDA in order 
to read the paper, which is self-contained, but a good knowledge 
of the language is required in order to understand the correctness 
proofs. 

2. The Krivine machine 

We are compiling the untyped applied call-by-name lambda cal- 
culus, i.e. a lambda calculus with constants. For the sake of a con- 
crete yet simple presentation we assume that the only data is natural 
numbers, and the constants are numeric literals, arithmetic opera- 
tors and if-then-else. Informally, the grammar of the language is 



M ::= x | λx.M | M M | if M then M else M | n
    | M ⊕ M | M @ A.

Formally, we define the data-type of terms with the following 
constructors: 

data Term : * where
  λ_             : Term → Term
  _$_            : (t t' : Term) → Term
  var            : ℕ → Term
  lit            : ℕ → Term
  op             : (f : ℕ → ℕ → ℕ) (t t' : Term) → Term
  if0_then_else_ : (b t f : Term) → Term
  _@_            : Term → Node → Term

Above, * is the "type of types". We are using the De Bruijn index
notation, so abstraction (λ_) is a unary operator and each variable
(var) is a natural number. The value of the index denotes the number
of binders between the variable and its binder. Function application
(_$_, an infix operator) is an explicit constructor, for clarity.
Numeric literals (lit) and branching (if0_then_else_) are obvious,
noting that the constructor for the latter is a mixfix operator. Binary
arithmetic operators (op) take three arguments: the function giving
the operation and two terms.

We also introduce syntactic support in the language (_@_, an- 
other infix operator) for specifying node assignments for closed 
sub-terms. This is done strictly for simplicity. Node assignment 
could be otherwise specified, e.g. using a separate configuration 
file, but it would needlessly complicate the presentation. Node as- 
signment is a "compiler pragma" and has no bearing on observa- 
tional properties of the programming languages. The requirement 
that node assignment is specified for closed terms only keeps the 
presentation as simple as possible. This apparent restriction can be 
easily overcome using lambda lifting. 

EXAMPLE 1. The term (λx.λy.y + x) 3 4 is represented as

termExample : Term
termExample = λ (λ (var 0 +' var 1)) $ lit 3 $ lit 4
  where _+'_ = op _+_
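
To make the index convention concrete, the following is a minimal Python sketch (it is not part of the AGDA formalisation) that converts a named term into De Bruijn form. The tuple encoding of terms and the helper name to_de_bruijn are assumptions made purely for illustration.

```python
import operator

# Hypothetical encoding of named terms:
#   ("lam", x, body), ("app", f, a), ("var", x), ("lit", n), ("op", fn, a, b)
# De Bruijn terms drop the binder name and use indices in ("var", index).

def to_de_bruijn(term, bound=()):
    """Replace each variable name by the number of binders between
    the variable and its own binder (0 = innermost binder)."""
    tag = term[0]
    if tag == "var":
        # `bound` lists binder names innermost first, so the first match wins.
        return ("var", bound.index(term[1]))
    if tag == "lam":
        return ("lam", to_de_bruijn(term[2], (term[1],) + bound))
    if tag == "app":
        return ("app", to_de_bruijn(term[1], bound), to_de_bruijn(term[2], bound))
    if tag == "op":
        return ("op", term[1],
                to_de_bruijn(term[2], bound), to_de_bruijn(term[3], bound))
    return term  # literals contain no variables

# The function part of Example 1: λx.λy. y + x
named = ("lam", "x", ("lam", "y",
         ("op", operator.add, ("var", "y"), ("var", "x"))))
de_bruijn = to_de_bruijn(named)  # λ (λ (var 0 + var 1))
```

Here y is bound by the innermost abstraction and receives index 0, while x has one binder in between and receives index 1, matching termExample.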

The Krivine machine is the standard abstract machine for call- 
by-name. It has three components: code, environment and stack. 
The stack and the environment contain thunks, which are closures 
representing unevaluated function arguments. The evaluations are 
delayed until the values are needed. For the pure lambda calculus, 
the Krivine machine uses three instructions: 

POPARG pop an argument from the stack and add it to the environ- 
ment. 

PUSHARG push a thunk for some code given as argument. 

VAR look up the argument in the environment and start evaluation. 

For the applied lambda calculus the machine becomes more 
complex because arithmetic operations are strict, so extra mech- 
anisms are required to force the evaluation of arguments. 

Formally, we define closures and environments by mutual re- 
cursion: 

mutual
  Closure = Term × Env
  data EnvEl : * where
    clos : Closure → EnvEl
  Env = List EnvEl

The constructor clos that takes a Closure into an environment ele- 
ment EnvEl is only needed for formal reasons, to prevent the AGDA 
type-checker from reporting a circular definition. 
Stacks and configurations are: 
data StackElem : * where
  arg : Closure → StackElem
  if0 : Closure → Closure → StackElem
  op₂ : (ℕ → ℕ → ℕ) → Closure → StackElem
  op₁ : (ℕ → ℕ) → StackElem

Stack  = List StackElem
Config = Term × Env × Stack



The generic stack elements (for function arguments) are constructed
using arg, whereas if0, op₂, op₁ are used by the constants.

The signature of the Krivine machine is given as a data-type,
defining a Relation on Configurations of the Krivine machine:

data _⟶K_ : Rel Config Config where

The relation type Rel A B is defined to be A → B → *, so two
elements a and b are R-related exactly when R a b is inhabited, given
R : Rel A B. Each rule, i.e. each instruction of the machine, will
thus correspond to a constructor. We explain the formal definition
of each rule.

POPARG : {t : Term} {e : Env} {c : Closure} {s : Stack} →
  (λ t, e, arg c :: s) ⟶K (t, clos c :: e, s)

POPARG handles abstractions λ t by moving the top of the stack
arg c into the first position of the environment e. The constructors
arg, clos are needed for type-checking and would be omitted in
an informal presentation. The constructor arguments (t, e, c, s) are
implicit, indicated syntactically in AGDA by curly brackets.

PUSHARG : {t t' : Term} {e : Env} {s : Stack} →
  ((t $ t'), e, s) ⟶K (t, e, arg (t', e) :: s)

PUSHARG handles application t $ t' by creating a new closure
arg (t', e) and pushing it onto the stack, then carrying on with the
execution of the function body t.



VAR : {n : ℕ} {e e' : Env} {t : Term} {s : Stack} →
  lookup n e ≡ just (clos (t, e')) →
  (var n, e, s) ⟶K (t, e', s)

In AGDA the ≡ operator denotes propositional equality, which
necessitates a proof, whereas = is used to introduce new definitions.
The VAR rule looks up the variable n in the current environment e
and, if successful, retrieves the closure at that position (t, e') and
proceeds to execute from it, with the current stack.

Because this is an applied lambda calculus we need additional 
operations for conditionals and operators. Here we omit the types 
of the implicit arguments since they can be inferred: 



COND : ∀ {b t f e s} →
  (if0 b then t else f, e, s) ⟶K (b, e, if0 (t, e) (f, e) :: s)
COND-0 : ∀ {e t e' f s} →
  (lit 0, e, if0 (t, e') f :: s) ⟶K (t, e', s)
COND-suc : ∀ {n e t f e' s} →
  (lit (1 + n), e, if0 t (f, e') :: s) ⟶K (f, e', s)
OP : ∀ {f t t' e s} →
  (op f t t', e, s) ⟶K (t, e, op₂ f (t', e) :: s)
OP₂ : ∀ {n e f t e' s} →
  (lit n, e, op₂ f (t, e') :: s) ⟶K (t, e', op₁ (f n) :: s)
OP₁ : ∀ {n e f s} →
  (lit n, e, op₁ f :: s) ⟶K (lit (f n), [], s)



EXAMPLE 2. We can see the Krivine machine at work in this sim- 
ple example. The term in Ex. 1 has the following execution trace, 
written informally as follows: 



((λ (λ _+_ 0 1) $ 3 $ 4), [], [])
⟶⟨ PUSHARG ⟩
((λ (λ _+_ 0 1) $ 3), [], [(4, [])])
⟶⟨ PUSHARG ⟩
(λ (λ _+_ 0 1), [], [(3, []), (4, [])])
⟶⟨ POPARG ⟩
(λ _+_ 0 1, [(3, [])], [(4, [])])
⟶⟨ POPARG ⟩
(_+_ 0 1, [(4, []), (3, [])], [])
⟶⟨ OP ⟩
(0, [(4, []), (3, [])], [op₂ _+_ (1, [(4, []), (3, [])])])
⟶⟨ VAR refl ⟩
(4, [], [op₂ _+_ (1, [(4, []), (3, [])])])
⟶⟨ OP₂ ⟩
(1, [(4, []), (3, [])], [op₁ (_+_ 4)])
⟶⟨ VAR refl ⟩
(3, [], [op₁ (_+_ 4)])
⟶⟨ OP₁ ⟩
(7, [], [])

In the above we have omitted the constructors op, var, arg, etc. for 
brevity. 
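
The rules above can also be read as an executable step function. The following Python sketch is only an illustration of that reading, not the authors' implementation: the tuple encoding of terms, closures and stack elements, and the names step and run, are assumptions.

```python
import operator

# Terms (De Bruijn): ("lam", t), ("app", t, u), ("var", n), ("lit", n),
#                    ("op", f, t, u), ("if0", b, t, f), ("at", t, node)
# A closure is a (term, env) pair; env and stack are tuples.

def step(term, env, stack):
    """One Krivine machine transition, or None if no rule applies."""
    tag = term[0]
    if tag == "lam" and stack and stack[0][0] == "arg":       # POPARG
        return term[1], (stack[0][1],) + env, stack[1:]
    if tag == "app":                                           # PUSHARG
        return term[1], env, (("arg", (term[2], env)),) + stack
    if tag == "var":                                           # VAR (closed terms only)
        t, e = env[term[1]]
        return t, e, stack
    if tag == "if0":                                           # COND
        _, b, t, f = term
        return b, env, (("if0", (t, env), (f, env)),) + stack
    if tag == "op":                                            # OP
        _, f, t, u = term
        return t, env, (("op2", f, (u, env)),) + stack
    if tag == "at":                                            # REMOTE: erase env
        return term[1], (), stack
    if tag == "lit" and stack:
        n, top, rest = term[1], stack[0], stack[1:]
        if top[0] == "if0":                                    # COND-0 / COND-suc
            t, e = top[1] if n == 0 else top[2]
            return t, e, rest
        if top[0] == "op2":                                    # OP2: partially apply f
            _, f, (u, e) = top
            return u, e, (("op1", lambda m, f=f, n=n: f(n, m)),) + rest
        if top[0] == "op1":                                    # OP1
            return ("lit", top[1](n)), (), rest
    return None

def run(term):
    """Iterate step from the initial configuration (term, [], [])."""
    config, steps = (term, (), ()), 0
    while True:
        nxt = step(*config)
        if nxt is None:
            return config, steps
        config, steps = nxt, steps + 1

# termExample from Example 1: (λ (λ (var 0 + var 1)) $ lit 3) $ lit 4
term_example = ("app",
                ("app", ("lam", ("lam", ("op", operator.add, ("var", 0), ("var", 1)))),
                 ("lit", 3)),
                ("lit", 4))
final, steps = run(term_example)  # final == (("lit", 7), (), ()), steps == 9
```

The nine transitions taken by run on term_example are the same ones listed in the trace of Example 2.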

Finally, we include a (degenerate) instruction for remote execution: 

REMOTE : ∀ {t i e s} → (t @ i, e, s) ⟶K (t, [], s)

This instruction is included strictly so that the _@_ construct for 
node assignment does not trigger a runtime error, but it is effec- 
tively a no-op: it simply erases the environment e, since node as- 
signment is meant to be applied only to closed terms. In the follow- 
ing section we will define the distributed Krivine machine, where 
the REMOTE instruction is meaningful. 

3. Krivine nets 
3.1 The machine 

We now extend the Krivine machine so that it supports an arbitrary 
pattern of distribution by letting several instances of the extended 
machine run in a network. We call these machines DKrivine ma- 
chines and they form Krivine Nets. The DKrivine machines extend 
the Krivine machines conservatively by adding new features. Each 
such machine is identified as a node in the network and has a
dedicated heap. A pointer into a heap may be tagged with a node
identifier, in which case it is a remote pointer, which can now be stored in
the environment along with local closures. The stack may now have 
as a bottom element a remote pointer indicating the existence of a 
remote stack extension, i.e. the fact that the information which logi- 
cally belongs to this stack is physically located on a different node. 
Finally, the configuration of the Krivine machine is now called a 
thread indicating that its execution can be dynamically started and 
halted. Internally, the heap structure is used for storing persistent 
data that needs to out-live the runtime of a thread. The new formal 
definitions are as follows: 

RPtr = Ptr × Node
ContPtr = RPtr

data EnvElem : * where
  local  : Closure → EnvElem
  remote : ContPtr → ℕ → EnvElem

Stack = List StackElem × Maybe (ContPtr × ℕ × ℕ)

ContHeap = Heap Stack

Thread = Term × Env × Stack

Machine = Maybe Thread × ContHeap

The definitions are straightforward, except for the remote envi- 
ronment element and the definition of stacks which require expla- 
nation. A remote ContPtr is a pointer to a continuation stack, and the 
constructor remote takes an additional natural number argument in- 
dicating the offset in that continuation stack where the referred clo- 
sure is stored. As stated, the stack now possibly includes a remote 
stack extension. This extension is to be thought of as being located
at the bottom of the local stack, and consists of a ContPtr pointing
into the heap of a remote node holding the stack, and two natural
numbers that form the current node's view of that stack. The first
number stores how many consecutive arguments there are on it, and
the second number is the offset into the remote stack that the view
starts from.

Because DKrivine machines are networked they exchange mes- 
sages, which fall into three categories, formalised as constructors 
for the Msg datatype: 

REMOTE A message with this tag initiates remote evaluation, for- 
mally defined as 

REMOTE : Term → Node → ContPtr → ℕ → Msg

The message consists of a Term, a destination Node identifier, a 
ContPtr to the sender's current continuation stack and a natural 
number indicating how many arguments are on that stack. 

The design decision to make a Term part of the message struc- 
ture is for simplicity of formalisation only. In the actual im- 
plementation only a code pointer needs to be sent to the node, 
which already has the required code available. The mechanism 
through which compiled code arrives at each node is handled 
by a distributed program loader (see e.g. [11]) which is part of 
the runtime system and, as such, beyond the scope of this work. 
It should be obvious that distributed program loading is possi- 
ble in principle here because all code is static and available at 
compile-time. 

RETURN These messages are sent when computation has termi- 
nated and reached a literal, and the value must be returned to 
the node that has initiated the computation. The formal defini- 
tion is: 

RETURN : ContPtr → ℕ → ℕ → Msg

The message contains a ContPtr to the remote stack of the machine
that is receiving the message, the natural number calculated
and another number indicating to the receiving machine
how many arguments can now be discarded from the stack,
corresponding to the offset of the sending node's view of the stack.

VAR is a message used to access remotely located variables. It
consists of a remote ContPtr, an offset into the remote continuation
stack, a pointer to the local continuation stack and the number of
arguments on it.

VAR : ContPtr → ℕ → ContPtr → ℕ → Msg

We need to send the continuation stack of the calling node (like 
in the REMOTE rule) because the remote variable may refer 
to a function, in which case the arguments are supplied by the 
calling node, or it may be part of an operation on the calling 
node, in which case the resulting number needs to be returned 
there once it has been calculated. 

Deliberate in the design of the Krivine nets is the need to minimise 
message exchange. To achieve this, machines do not send remote 
"pop" messages for manipulating remote stack extensions, but per- 
form this operation locally. When a node sends a pointer to a new 
continuation stack it also sends the number of arguments that are 
on that stack, so that the receiving node can pop arguments from its 
local view of that stack. 

We can now start describing the transitions of the DKrivine 
machine. The signature of the transition relation is: 

data _⊢_⟶DK⟨_⟩_ (i : Node) :
  Machine → Tagged Msg → Machine → *

DKrivine transitions are parameterised by the current node
identifier and map a Machine state and a Tagged Msg into a new Machine
state. The tag applied to the message indicates whether the message
is sent, received or absent (i.e. a silent transition):

data Tagged (Msg : *) : * where
  τ       : Tagged Msg
  send    : Msg → Tagged Msg
  receive : Msg → Tagged Msg

All the old rules are present, but now expressed in the presence 
of the continuation heap. 

POPARG : ∀ {t e c s r ch} →
  i ⊢ (just (λ t, e, arg c :: s, r), ch) ⟶DK⟨ τ ⟩
  (just (t, local c :: e, s, r), ch)

Compared to the POPARG rule of the original machine, the only
differences are the tag on the configuration (just ...), which
expresses the fact that the DKrivine thread is running, and the
continuation heap ch which remains constant during the application of
this rule. The environment element constructor local now
emphasises that the variable is local. Because the transition involves only
one node it is τ, i.e. no messages are exchanged.

The other old transition rules are embedded into the DKrivine 
machine in a similar way. They are all silent and the continuation 
heap ch stays unchanged: 

PUSHARG : ∀ {t t' e s r ch} →
  i ⊢ (just ((t $ t'), e, s, r), ch) ⟶DK⟨ τ ⟩
  (just (t, e, arg (t', e) :: s, r), ch)
VAR : ∀ {n e s r ch t e'} →
  lookup n e ≡ just (local (t, e')) →
  i ⊢ (just (var n, e, s, r), ch) ⟶DK⟨ τ ⟩
  (just (t, e', s, r), ch)
COND : ∀ {b t f e s r ch} →
  i ⊢ (just (if0 b then t else f, e, s, r), ch) ⟶DK⟨ τ ⟩
  (just (b, e, if0 (t, e) (f, e) :: s, r), ch)
COND-0 : ∀ {e t e' f s r ch} →
  i ⊢ (just (lit 0, e, if0 (t, e') f :: s, r), ch) ⟶DK⟨ τ ⟩
  (just (t, e', s, r), ch)
COND-suc : ∀ {n e t e' f s r ch} →
  i ⊢ (just (lit (1 + n), e, if0 t (f, e') :: s, r), ch) ⟶DK⟨ τ ⟩
  (just (f, e', s, r), ch)
OP : ∀ {f t t' e s r ch} →
  i ⊢ (just (op f t t', e, s, r), ch) ⟶DK⟨ τ ⟩
  (just (t, e, op₂ f (t', e) :: s, r), ch)
OP₂ : ∀ {n e f t e' s r ch} →
  i ⊢ (just (lit n, e, op₂ f (t, e') :: s, r), ch) ⟶DK⟨ τ ⟩
  (just (t, e', op₁ (f n) :: s, r), ch)
OP₁ : ∀ {n e f s r ch} →
  i ⊢ (just (lit n, e, op₁ f :: s, r), ch) ⟶DK⟨ τ ⟩
  (just (lit (f n), [], s, r), ch)

The REMOTE execution rule is now meaningful, and it has a 
send and a receive version: 

REMOTE-send : ∀ {t i' e s ch} →
  let (ch', kp) = i ⊢ ch ▸ s in
  i ⊢ (just (t @ i', e, s), ch)
    ⟶DK⟨ send (REMOTE t i' kp (num-args s)) ⟩
  (nothing, ch')

The operation i ⊢ ch ▸ s signifies allocating at node i in heap ch a
new pointer pointing at stack s, and it returns a pair of the updated
heap ch' and the newly allocated remote pointer kp. The
remote-execution directive t @ i' is carried out by sending a REMOTE
message to i' consisting of the (pointer to) code t, the destination i',
the local continuation-stack pointer kp and the number of arguments
on it. After sending the remote execution message the thread halts,
i.e. its state is nothing.

The function that calculates the number of arguments on the 
stack is quite subtle and we give its formal expression below: 






num-args : Stack → ℕ
num-args ([], nothing)        = 0
num-args ([], just (_, n, _)) = n
num-args (arg _ :: s, r)      = 1 + num-args (s, r)
num-args (if0 _ _ :: _, _)    = 0
num-args (op₂ _ _ :: _, _)    = 0
num-args (op₁ _ :: _, _)      = 0



The function returns the number of arguments at the top of the 
stack, but it takes into account the possibility that some arguments 
are local and some arguments are remote. Recall that the remote 
pointer that we store at the bottom of the stack, pointing to the re- 
mote stack extension, also has a natural number numargs expressing 
how many arguments are stored remotely. This is an important op- 
timisation because it makes it possible for this function to be evalu- 
ated locally, without querying the remote machine where the stack 
extension is physically located. 
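
As an informal cross-check of num-args, here is a small Python sketch. It assumes a hypothetical encoding in which a stack is a pair (elems, ext): elems is a tuple of tagged elements such as ("arg", closure), and ext is either None or a triple (kp, args, offset) describing the local view of the remote stack extension.

```python
def num_args(stack):
    """Count the arguments at the top of a stack without any communication.

    stack = (elems, ext); ext is None or (kp, args, offset), where `args`
    is the number of arguments still visible in the remote extension.
    """
    elems, ext = stack
    count = 0
    for elem in elems:
        if elem[0] != "arg":
            # An if0 / op2 / op1 frame ends the run of arguments.
            return count
        count += 1
    # Only arguments were found locally: add the remotely stored ones, if any.
    return count if ext is None else count + ext[1]

view = (("kp-address", "A"), 2, 1)            # hypothetical remote view
local_args = (("arg", "c0"), ("arg", "c1"))   # closures abbreviated as strings
total = num_args((local_args, view))          # 2 local + 2 remote = 4
```

The count is computed entirely from local data, which is the point of storing the number of remote arguments in the view.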

The counterpart REMOTE-receive rule is: 

REMOTE-receive : ∀ {ch t kp numargs} →
  i ⊢ (nothing, ch)
    ⟶DK⟨ receive (REMOTE t i kp numargs) ⟩
  (just (t, [], [], just (kp, numargs, 0)), ch)

The thread on node i is halted when it receives the REMOTE 
execution message, with the same contents as above. The code t 
becomes the currently executed code, in an empty environment (t is, 
as we explained before, closed) and empty stack remotely extended 
by kp to the originating machine stack. 
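
The send and receive halves of remote execution can be sketched in Python as follows. This is an illustration under assumptions, not the formal definition: heaps are modelled as dictionaries with never-reused integer addresses, messages are plain tuples, and the names alloc, remote_send and remote_receive are hypothetical. The argument count is passed in as n_args, standing for the result of num-args.

```python
def alloc(node, heap, stack):
    """Model of allocating `stack` in the heap of `node`.
    Returns the updated heap and a remote pointer (address, node)."""
    addr = len(heap)  # assumption: addresses are never freed or reused
    return {**heap, addr: stack}, (addr, node)

def remote_send(node, thread, heap, n_args):
    """REMOTE-send: thread = (("at", t, dest), env, stack)."""
    (_, t, dest), _env, stack = thread
    heap2, kp = alloc(node, heap, stack)      # save the continuation stack
    msg = ("REMOTE", t, dest, kp, n_args)
    return (None, heap2), msg                  # the sending thread halts

def remote_receive(node, machine, msg):
    """REMOTE-receive: only a halted machine at the destination accepts."""
    thread, heap = machine
    tag, t, dest, kp, n_args = msg
    if thread is not None or tag != "REMOTE" or dest != node:
        return None
    # Empty environment, empty local stack, remote view starting at offset 0.
    return ((t, (), ((), (kp, n_args, 0))), heap)

stack_b = ((("arg", (("lit", 3), ())),), None)
machine_b, msg = remote_send("B", (("at", ("lit", 1), "C"), (), stack_b), {}, 1)
machine_c = remote_receive("C", (None, {}), msg)
```

After the exchange node B is halted with its stack saved at address 0, and node C runs the received code with a view of one argument on B's stack.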

Additionally, some of the original rules now have send and 
receive counterparts to handle the situation when remote variables 
or continuations need to be processed. Remarkably, it is possible to 
avoid sending messages when popping a remote argument, and we 
can get by with the following new instruction: 

POPARG-remote : ∀ {t e kp args m ch} →
  i ⊢ (just (λ t, e, [], just (kp, 1 + args, m)), ch)
    ⟶DK⟨ τ ⟩
  (just (t, remote kp m :: e, [], just (kp, args, 1 + m)), ch)

Note that this is a silent (τ) transition. A machine does not really
"pop" the arguments of a remote stack extension but changes its 
view of this remote stack. This avoids instituting a whole class of 
messages for stack management and it also gives a more robust 
stack management framework in which stacks, along with heaps 
and any other data structures involved, are only changed locally. 

This rule is triggered when a POPARG action encounters a local 
empty stack, which means that the remote stack extension needs to 
be used. Just like in the case of a local POPARG, the environment 
is updated, but this time with the remote pointer kp which has its 
offset set at m. The offset in the view of the remote stack extension 
is updated (to 1 + m) to reflect the fact that another argument has 
been "popped". 
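
The view update performed by POPARG-remote can be illustrated with a few lines of Python. The tuple encoding of environment elements and the function name poparg_remote are assumptions made for this sketch.

```python
def poparg_remote(env, view):
    """Bind the next remote argument without sending a message.

    view = (kp, args, offset). Returns the extended environment and the
    new view, or None when the remote extension has no arguments left.
    """
    kp, args, offset = view
    if args == 0:
        return None
    # The new environment element remembers where the argument lives.
    new_env = (("remote", kp, offset),) + env
    return new_env, (kp, args - 1, offset + 1)

kp = (0, "B")                                  # hypothetical remote pointer
env1, view1 = poparg_remote((), (kp, 2, 0))    # first remote argument
env2, view2 = poparg_remote(env1, view1)       # second remote argument
```

Only the local view changes: the stack stored on node B is never modified by these steps.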

The rules that need genuine remote counterparts are VAR, for 
accessing remote variables, and RETURN, for returning a literal 
from a remote computation. 

VAR-send : ∀ {n e s rkp index ch} →
  lookup n e ≡ just (remote rkp index) →
  let (ch', kp) = i ⊢ ch ▸ s in
  i ⊢ (just (var n, e, s), ch)
    ⟶DK⟨ send (VAR rkp index kp (num-args s)) ⟩
  (nothing, ch')

The rule is triggered when the machine detects a remote pointer 
in its environment e. Just like in the case of the REMOTE instruc- 
tion, the current continuation stack is saved in the continuation heap 
of the machine i, at address kp. The machine then sends a
VAR-tagged message onto the network, with the structure discussed
before, and halts, i.e. its thread is nothing. Note that the left-hand side
of the transition triggered by the VAR-send rule is almost the same
as that of the local VAR rule.



Upon receiving a VAR message, a (halted) machine executes the 
VAR-receive instruction: 

VAR-receive : ∀ {ch kp s n rkp m el} →
  ch ! kp ≡ just s →
  stack-index s n ≡ just el →
  i ⊢ (nothing, ch)
    ⟶DK⟨ receive (VAR (kp, i) n rkp m) ⟩
  (just (var 0, el :: [], [], just (rkp, m, 0)), ch)

The right-hand side of the VAR-receive rule introduces a new
variable var 0, perhaps surprisingly. In order to avoid having special
cases where the retrieved variable index is itself either local or
remote, we create the dummy variable var 0 referring to the
variable pointed to by the received VAR message. This is what the
stack-index : Stack → ℕ → Maybe EnvElem function, invoked on
the stack that kp points to, achieves. If the stack element at index
n in the stack is a local argument, then it returns that closure as a
local environment element. If the element at index n refers to an
argument on the remote stack extension, it returns a corresponding
remote environment element. Afterwards we can use the existing
local VAR or VAR-send rules depending on whether the variable is
local to this node or remote to it as well.
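
The definition of stack-index is not listed here, so the following Python sketch is only one possible reading of the prose description. It assumes that indices count from the top of the saved stack, that only argument elements can be indexed, and that an index past the local elements is translated into an offset in the remote view; the encoding and the name stack_index are hypothetical.

```python
def stack_index(stack, n):
    """Return the environment element for the n-th argument of `stack`.

    stack = (elems, ext) with ext = None or (kp, args, offset).
    Assumption: a non-argument frame before index n makes the lookup fail.
    """
    elems, ext = stack
    for i, elem in enumerate(elems):
        if elem[0] != "arg":
            return None
        if i == n:
            return ("local", elem[1])          # argument stored on this node
    if ext is None:
        return None
    kp, args, offset = ext
    k = n - len(elems)                         # position inside the remote view
    return ("remote", kp, offset + k) if k < args else None

saved = ((("arg", "c0"), ("arg", "c1")), ((7, "A"), 2, 3))
first = stack_index(saved, 0)    # a local closure
third = stack_index(saved, 2)    # falls into the remote extension
```

Under this reading, the element returned for a remote index has the same shape as the one created by POPARG-remote, so the receiving node can continue with the ordinary VAR or VAR-send rules.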

RETURN-send : ∀ {n e kp m ch} →
  i ⊢ (just (lit n, e, [], just (kp, 0, m)), ch)
    ⟶DK⟨ send (RETURN kp n m) ⟩
  (nothing, ch)
RETURN-receive : ∀ {ch kp s s' n m} →
  ch ! kp ≡ just s → drop-stack s m ≡ just s' →
  i ⊢ (nothing, ch)
    ⟶DK⟨ receive (RETURN (kp, i) n m) ⟩
  (just (lit n, [], s'), ch)

Finally, the RETURN-send and RETURN-receive rules are trig- 
gered when a machine has reached a literal and has a remote stack 
extension without any arguments, implying that the remote stack 
is either empty (i.e. it is located at the root node of the whole ex- 
ecution) or it has a continuation requiring a natural number literal. 
In both cases we want to send the literal back to the node where 
the stack is located. The one thing to notice is that the message in- 
cludes the number m to be used by the receiver to drop the correct 
number of elements from the top of the stack. This is handled by 
the drop-stack function, defined as follows: 

drop-stack : Stack → ℕ → Maybe Stack
drop-stack (s, r) 0 = just (s, r)
drop-stack ([], just (_, 0, _)) (1 + _) = nothing
drop-stack ([], just (kp, 1 + n, m)) (1 + i) =
  drop-stack ([], just (kp, n, 1 + m)) i
drop-stack ([], nothing) (1 + _) = nothing
drop-stack (arg _ :: s, r) (1 + i) = drop-stack (s, r) i
drop-stack (_ :: _, _) (1 + _) = nothing

As in the case of num-args the function may change the local 
view of a remote stack extension, without requiring further message 
exchanges between nodes. If not enough arguments are on the 
stack the function returns nothing, which should not happen during 
a normal execution since we take care to keep the stack views 
consistent. 
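
For readers less familiar with AGDA pattern matching, drop-stack can be transcribed into Python roughly as follows. The (elems, ext) encoding of stacks and the name drop_stack are assumptions of this sketch; None plays the role of nothing.

```python
def drop_stack(stack, n):
    """Drop n arguments from the top of a stack, or return None on failure.

    stack = (elems, ext) with ext = None or (kp, args, offset). Local
    arguments are removed; remote ones are dropped by adjusting the view.
    """
    elems, ext = stack
    while n > 0:
        if elems:
            if elems[0][0] != "arg":
                return None                    # a non-argument frame is in the way
            elems = elems[1:]
        elif ext is not None and ext[1] > 0:
            kp, args, offset = ext
            ext = (kp, args - 1, offset + 1)   # change the view only
        else:
            return None                        # not enough arguments
        n -= 1
    return elems, ext

mixed = ((("arg", "c0"),), ((0, "A"), 2, 0))
after = drop_stack(mixed, 2)   # one local argument and one remote argument
```

As with num-args, no message is needed: dropping a remote argument only shifts the offset of the local view.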

3.2 The network 

We consider two kinds of networks, either based on synchronous 
message passing (blocking send) or asynchronous message passing 
(non-blocking send). The two definitions are: 

SNet = Node → Machine

AsNet = (Node → Machine) × List Msg

The way we model the asynchronous network is inspired by 
the Chemical Abstract Machine (CHAM) [3]. The network is, 
in addition to a family of machines indexed by Node identifiers, 
a global multiset of messages List Msg in which sent messages 
data _⟶S_ (nodes : SNet) : SNet → * where
  silent-step : ∀ {i m'} → (i ⊢ nodes i ⟶⟨ τ ⟩ m') → nodes ⟶S update nodes i m'
  comm-step : ∀ {s r msg sender' receiver'} →
    let nodes' = update nodes s sender' in
    (s ⊢ nodes s ⟶⟨ send msg ⟩ sender') → (r ⊢ nodes' r ⟶⟨ receive msg ⟩ receiver') → nodes ⟶S update nodes' r receiver'

data _⟶As_ : AsNet → AsNet → * where
  step : ∀ {nodes} msgsl msgsr {tmsg m' i} →
    let (msgin, msgout) = detag tmsg in
    (i ⊢ nodes i ⟶⟨ tmsg ⟩ m') → (nodes, msgsl ++ msgin ++ msgsr) ⟶As (update nodes i m', msgsl ++ msgout ++ msgsr)

Figure 1. Network transitions



are placed, and from which received messages are retrieved. The 
formal definitions are given in Fig. 1. 

In the SNet messages are passed directly between machines.
Network transitions are either a silent-step when a node makes a
τ transition, or comm-step when two nodes exchange information.
The AsNet only has a generic step, because no synchronisation is
needed. A machine on a node may take a τ step or a communication
step, in which case a message is placed in or removed from the global
set of messages. The function detag figures out what messages a
node is sending and receiving, allowing one rule for all three cases,
as at most one of msgin and msgout in the rule is non-empty:

detag : {A : *} → Tagged A → List A × List A
detag τ           = [], []
detag (send x)    = [], [x]
detag (receive x) = [x], []
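
The effect of detag on the global message pool can be sketched in Python as below. The encoding of tags (None for a silent step, otherwise a pair of "send" or "receive" and a message) and the helper async_step are assumptions; since the pool is treated as a multiset, the sketch appends sent messages at the end rather than at an arbitrary position.

```python
def detag(tagged):
    """Split a tagged message into (consumed messages, produced messages)."""
    if tagged is None:                 # silent transition
        return [], []
    kind, msg = tagged
    return ([], [msg]) if kind == "send" else ([msg], [])

def async_step(pool, tagged):
    """Update the message pool for one machine transition.

    Returns the new pool, or None if a message to be received is absent.
    """
    msg_in, msg_out = detag(tagged)
    if msg_in:
        if msg_in[0] not in pool:
            return None
        i = pool.index(msg_in[0])
        return pool[:i] + pool[i + 1:]   # remove exactly one occurrence
    return pool + msg_out

pool0 = async_step([], ("send", "m1"))        # message placed in the pool
pool1 = async_step(pool0, ("receive", "m1"))  # message removed again
```

Placing and then removing a message in this way is the pattern by which a synchronous communication step is simulated asynchronously.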

Another helper function used in the definitions is update, which
updates the state of a node in the network. It is the usual function
update, commonly written as (f | x ↦ y), here relying on the
assumption that the set of node identifiers has decidable equality
(_≟_). It is formally defined as:

update : {A : *} → (Node → A) → Node → A → Node → A
update nodes n m n' with n' ≟ n
update nodes n m n' | yes _ = m
update nodes n m n' | no _ = nodes n'

In AGDA, the with keyword introduces patterns additional to the 
arguments in a function definition. 
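
The same function update can be sketched in Python, modelling a network as a function from node identifiers to states. This is an illustration only; the node names and states below are hypothetical.

```python
def update(nodes, n, m):
    """Return a network equal to `nodes` except that node `n` maps to `m`."""
    def updated(n2):
        # Decidable equality on node identifiers is plain `==` here.
        return m if n2 == n else nodes(n2)
    return updated


# Hypothetical example: every node starts idle, then node "A" is updated.
idle_network = lambda node: "idle"
network = update(idle_network, "A", "running")
```

Because the result is a new function, the original network is left unchanged, as in the AGDA definition.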

The definition of network transitions is parameterised by a ma- 
chine transition relation _⊢_⟶⟨_⟩_, which is subsequently 
instantiated to ⟶DK, and initialised by starting from a designated 
node i with code t and all other constituents empty. 

open import Network Node _≟_ _⊢_⟶DK⟨_⟩_ public 

initial-networkS : Term → Node → SNet 
initial-networkS t i = 
  update (λ i' → (nothing , ∅)) i (just (t , [] , [] , nothing) , ∅) 

initial-networkAs : Term → Node → AsNet 
initial-networkAs c i = initial-networkS c i , [] 

It is immediate to show that an SNet can be represented by 
the more expressive AsNet. This is the function mapping a Sync 
transition to an Async one by placing, then removing, the message 
in the global message pool (here _⁺ takes the transitive closure of a 
relation, constructed with list-like notation): 

Sync-to-Async⁺ : ∀ {a b} → (a ⟶S b) → 
  (a , []) ⟶As⁺ (b , []) 
Sync-to-Async⁺ (silent-step s)   = [ step [] [] s ] 
Sync-to-Async⁺ (comm-step s₁ s₂) = step [] [] s₁ ∷ [ step [] [] s₂ ] 

The other direction is not as trivial, and is formalised by the 
following lemma, stating that whenever some DKrivine machines 
can make an Async transition with the global pool of messages 
remaining the same (empty, for simplicity), the same transition 
could be made in a SNet: 



Async⁺-to-Sync⁺ : ∀ {nodes nodes'} i → 
  all nodes except i are inactive → 
  ((nodes , []) ⟶As⁺ (nodes' , [])) → 
  nodes ⟶S⁺ nodes' 
Async⁺-to-Sync⁺ = Async⁺-to-Sync⁺-lemma refl refl 

The proof is an immediate application of a more complex lemma 
which is omitted from this presentation. In contrast to the Sync-to-Async⁺ 
embedding, this embedding is specific to DKrivine machines. More 
precisely, two properties of these machines make this possible. The 
first is that DKrivine machines halt after each message send 
and receive only from halting states. The second is that they 
are deterministic. Intuitively, it is fairly clear that the two styles of 
communication are equivalent under these circumstances. 

These two results about Krivine Nets are interesting because 
they show that we do not need to commit to a synchronous or asyn- 
chronous network of DKrivine machines since they are equivalent. 
We may therefore use whichever is more convenient for correct- 
ness proofs in the knowledge that the properties we prove transfer 
immediately to the other one. 

3.3 Example 

Let us compare briefly the execution of a rather simple term, 

(λf. λx. f x) @ B (λy. y) 0 

on a single machine and on a distributed machine. The program is 
located on (the default) node A, except for λf.λx. f x, which is on 
node B. This program is similar to our introductory example in that 
it does a remote function call, and additionally shows that higher- 
order remote function calls are also possible. 

As we discussed earlier, the Krivine machine ignores the @ 
construct (the REMOTE rule is a no-op), producing the execu- 
tion trace PUSHARG; PUSHARG; REMOTE; POPARG; POPARG; 
PUSHARG; VAR; POPARG; VAR; VAR, which leaves the machine 
in state (lit 0, [],[]). 
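
The single-machine trace can be reproduced with a small Python sketch of the Krivine machine restricted to the four rules that appear in it. This is an illustration under assumptions, not the formalised machine: the tuple encoding of de Bruijn terms is chosen for this sketch, and the rules for arithmetic and conditionals are left out.

```python
# De Bruijn terms encoded as tuples (an encoding assumed for this sketch).
def lam(body): return ("lam", body)
def app(fun, arg): return ("app", fun, arg)
def var(index): return ("var", index)
def lit(value): return ("lit", value)
def remote(term, node): return ("remote", term, node)


def run(term):
    """Run the machine from (term, [], []) and return (final state, trace)."""
    env, stack, trace = [], [], []
    while True:
        kind = term[0]
        if kind == "app":
            # PUSHARG: push the argument as a closure, continue with the function.
            stack = [(term[2], env)] + stack
            term = term[1]
            trace.append("PUSHARG")
        elif kind == "lam" and stack:
            # POPARG: bind the closure on top of the stack.
            env = [stack[0]] + env
            stack = stack[1:]
            term = term[1]
            trace.append("POPARG")
        elif kind == "var":
            # VAR: continue with the closure bound to the variable.
            term, env = env[term[1]]
            trace.append("VAR")
        elif kind == "remote":
            # REMOTE: a no-op on a single machine.
            term = term[1]
            trace.append("REMOTE")
        else:
            # No rule applies (for example, a literal): the machine stops.
            return (term, env, stack), trace


# The example term in de Bruijn form: ((λ λ (var 1 $ var 0)) @ B) (λ var 0) (lit 0).
program = app(app(remote(lam(lam(app(var(1), var(0)))), "B"), lam(var(0))), lit(0))
final_state, trace = run(program)
```

Running the sketch on the example term yields the ten-step trace listed above and the final state (lit 0, [], []).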

The Krivine Net of two nodes produces the following trace (in- 
formally, indicating machine state only when interesting). Node A 
starts with PUSHARG; PUSHARG; REMOTE-send, which produces 
message 

REMOTE (λ (λ (var 1 $ var 0))) B (ptr₁, A) 2 

where ptr₁ points to the stack ([(λ var 0, []), (0, [])], nothing). 

Node B receives the message and executes REMOTE-receive; 
POPARG-remote; POPARG-remote; PUSHARG; VAR-send, which 
produces message VAR (ptr₁, A) 0 (ptr₂, B) 1, where ptr₂ points 
to the stack 

[(var 0, [remote (ptr₁, A) 1, remote (ptr₁, A) 0])], just ((ptr₁, A), 0, 2) 

Note that the two traces are essentially the same, except for the 
REMOTE rule becoming meaningful. As we explained before, the 
POPARG-remote rule only changes the local view of the remote 
stack extension and generates no communication overhead. Also 
note that the stack at ptr₂ extends remotely to the stack at ptr₁ and 
uses it in its own stored closures. 

Figure 2. Final heap (panels: Heap A, Heap B) 



The rest of the dialogue is as follows: 

Node A : VAR-receive; VAR; POPARG-remote; VAR-send 

Node B : VAR-receive; VAR; VAR-send 

Node A : VAR-receive; VAR; RETURN-send 

Node B : RETURN-receive; RETURN-send 

Node A : RETURN-receive; RETURN-send 

Node B : RETURN-receive; RETURN-send 

Node A : RETURN-receive 

Compared to the Krivine trace, the VAR instruction is here 
split into a send and a receive version when the requested variable is 
remote. There is also the additional VAR rule needed to avoid a case 
statement on whether a variable is local or remote. The RETURN 
instructions are new, required to forward computed values to the 
caller. 

After the execution, the heaps of the two nodes are: 

A : {ptr₁ ↦ ([arg (λ var 0, []); arg (lit 0, [])], nothing), 
     ptr₃ ↦ ([], just ((ptr₂, B), 0, 1))} 
B : {ptr₂ ↦ ([arg (var 0, [remote (ptr₁, A) 1; remote (ptr₁, A) 0])], 
             just ((ptr₁, A), 0, 2)), 
     ptr₄ ↦ ([], just ((ptr₃, A), 0, 0))} 

A graphical representation of the final heap is in Fig. 2, with stack 
extension pointers in black and remote variables in grey. 

Unlike the Krivine machine, the Krivine Nets will result in non- 
empty heaps (garbage) in the individual DKrivine machines. We 
will discuss how to deal with this in the conclusion. 

4. Correctness 

We prove the correctness of the DKrivine Net by exhibiting a sim- 
ulation between the conventional Krivine machine and a Krivine 
Net. The simulation is then used to prove the following Soundness 
theorems: 

termination-agreesS : ∀ cfg nodes n → RS cfg nodes → 
  cfg ↓K lit n → nodes ↓S lit n 

divergence-agreesS : ∀ cfg nodes → RS cfg nodes → 
  cfg ↑K → nodes ↑S 

The termination theorem states that for any Krivine machine con- 
figuration cfg and any Krivine Net configuration nodes, if we have 
a simulation relation RS between them then for any literal n, if the 
Krivine machine starting from cfg produces the literal n, then the 
Krivine Net starting from configuration nodes produces the same. 
Note that we are using Sync nets, because they are more conve- 
nient and because Async nets can be reduced to Sync nets in the case 
of Krivine Nets, as discussed in Sec. 3.2. The divergence theorem 
makes a similar point about non-termination: from related states, if 
the Krivine machine diverges then the Krivine Net diverges. 



4.1 The simulation relation 

The most important ingredient of the correctness proof is defining 
and exhibiting the appropriate simulation relation. At the top level, 
the relation between the Krivine machine and Krivine Nets config- 
urations is defined formally as follows: 

RS : Rel Config SNet 
RS cfg nodes = ∃ λ i → 
  all nodes except i are inactive × 
  RMachine (proj₂ ∘ nodes) cfg (proj₁ (nodes i)) 

In AGDA notation the existential statement ∃i.P(i) is written 
∃ λ i → P i. The predicate all_except_are_ is defined as 

all f except x are P = ∀ x' → x' ≢ x → P (f x') 

and inactive node holds when the thread of node is nothing. A 
simulation between machine and net configurations exists only 
when precisely one node i is active in the net. The machine at 
that node (proj₁ (nodes i)) must be related to the configuration 
of the Krivine machine through the following machine-simulation 
relation: 

RMachine : Heaps → Rel Config (Maybe Thread) 
RMachine hs (t₁ , e₁ , s₁) (just (t₂ , e₂ , s₂)) = 
  RTerm t₁ t₂ × REnv hs e₁ e₂ × (∃ λ rank → RStack rank hs s₁ s₂) 
RMachine hs (t₁ , e₁ , s₁) nothing = ⊥ 

The relation is indexed by the distributed heap of the Krivine Net 
hs : Heaps, which is the Node-indexed family of all the individual 
heaps. This relation RMachine simply distributes the relation further 
to terms using RTerm, environments using REnv and stacks using 
RStack. In order for this to be possible it is required that the DKrivine 
machine is not halted (nothing : Maybe Thread). 

On terms, the relation RTerm is just propositional equality, while 
REnv and RStack are more subtle and require a non-trivial proof 
technique. RStack is similar to a step-indexed relation [2] on stacks. 
It is defined by induction on a natural number rank in order to ensure 
that the cascading remote stack extensions do not have any cycles. 
Unlike a step-indexed relation, rank means that we do exactly rank 
remote-pointer dereferencings in the process of relating two stacks, 
and RStack requires that this number is known. REnvElem, used by REnv 
to relate environment elements, is defined by induction on a rank for 
the same reason. 

4.1.1 Relating environments 

On environments, the formal definition of the relation is: 

REnv : Heaps → Rel Krivine.Env DKrivine.Env 
REnv hs []        []        = ⊤ 
REnv hs []        (x₂ ∷ e₂) = ⊥ 
REnv hs (x₁ ∷ e₁) []        = ⊥ 
REnv hs (x₁ ∷ e₁) (x₂ ∷ e₂) = 
  (∃ λ rank → REnvElem rank hs x₁ x₂) × REnv hs e₁ e₂ 

Empty environments are trivially related, but environments of dif- 
ferent shapes cannot be related. If both environments are non- 
empty then the definition is inductive on the structure of the envi- 
ronment. Environment elements are related by requiring that there 
exists a rank such that they are related by REnvElem: 

REnvElem : ℕ → Heaps → Rel Krivine.EnvElem DKrivine.EnvElem 
REnvElem 0          hs (clos c₁) (local c₂) = RClosure hs c₁ c₂ 
REnvElem (1 + rank) hs (clos c₁) (local c₂) = ⊥ 
REnvElem 0          hs (clos c₁) (remote contptr index) = ⊥ 
REnvElem (1 + rank) hs (clos c₁) (remote contptr index) = 
  stack-ext-pred hs contptr 
    (λ s₂ → ∃ λ ee₂ → stack-index s₂ index ≡ just ee₂ 
                     × REnvElem rank hs (clos c₁) ee₂) 

Local closures of the DKrivine machine relate to closures of 
the Krivine machine (RClosure) if their terms are equal and their 
environments are related through REnv. Relating remote closures 
(remote contptr index) of the DKrivine machine to the closures of 
the Krivine machine (clos c₁) is perhaps the most subtle part of the 
definition. It uses the following helper function which ensures that, 
given a distributed heap hs : Heaps, a remote pointer (ptr, loc) : 
ContPtr and a predicate on distributed stacks DKrivine.Stack → ★, 
the pointer points to a stack in the heap of node loc such that the 
predicate holds: 

stack-ext-pred : Heaps → ContPtr → (DKrivine.Stack → ★) → ★ 
stack-ext-pred hs (ptr , loc) P = ∃ λ s → hs loc ! ptr ≡ just s × P s 

The pointer dereferencing operation is hs loc ! ptr. The predicate 
which we use in the definition of REnvElem is that there exists an 
element ee₂, found at position index of the dereferenced stack, such that 
it is REnvElem-related to the Krivine closure clos c₁ at rank one less. 

Note that the rank has to be 0 to relate local elements, and it 
has to be 1 + rank to relate a remote element. The recursive call 
is done with the predecessor rank, which makes sure that there are 
exactly rank pointers to follow to reach a local closure if we have 
an element of REnvElem rank hs x₁ x₂. 
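
The role of the rank can be illustrated with a Python sketch that follows remote pointers through a distributed heap. It is a simplification under stated assumptions: heaps are nested dictionaries from node to pointer to a list of environment elements, elements are tagged tuples, and the function name related_at_rank and the parameter closures_related (standing in for RClosure) are hypothetical.

```python
def related_at_rank(rank, heaps, closure, elem, closures_related):
    """Check that `elem` reaches a closure related to `closure`
    after exactly `rank` remote-pointer dereferences."""
    if elem[0] == "local":
        # Local elements are only related at rank 0.
        return rank == 0 and closures_related(closure, elem[1])
    # Otherwise elem is ("remote", (ptr, loc), index).
    if rank == 0:
        return False
    _, (ptr, loc), index = elem
    stack = heaps.get(loc, {}).get(ptr)
    if stack is None or index >= len(stack):
        return False
    # Recurse with the predecessor rank, as in the formal definition.
    return related_at_rank(rank - 1, heaps, closure, stack[index], closures_related)


# Hypothetical heaps: node B's element points into node A's stack.
heaps = {
    "A": {1: [("local", "f"), ("local", "x")]},
    "B": {2: [("remote", (1, "A"), 1)]},
}
same = lambda c1, c2: c1 == c2
```

Because each recursive call decreases the rank, the check terminates even if the heap were to contain a cycle of remote pointers, which is the reason the formal relation is indexed by a natural number.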

4.1.2 Relating stacks 

Relating stacks is somewhat similar. 

RStack : ℕ → Heaps → Rel Krivine.Stack DKrivine.Stack 
RStack rank       hs (x₁ ∷ s₁) ([] , nothing) = ⊥ 
RStack rank       hs []        (x₂ ∷ s₂ , r)  = ⊥ 
RStack 0          hs []        ([] , nothing) = ⊤ 
RStack (1 + rank) hs []        ([] , nothing) = ⊥ 
RStack rank       hs (x₁ ∷ s₁) (x₂ ∷ s₂ , r)  = 
  RStackElem hs x₁ x₂ × RStack rank hs s₁ (s₂ , r) 
RStack 0          hs s₁ ([] , just (contptr , args , drop)) = ⊥ 
RStack (1 + rank) hs s₁ ([] , just (contptr , args , drop)) = 
  stack-ext-pred hs contptr (λ s₂ → 
    ∃ λ ds₂ → drop-stack s₂ drop ≡ just ds₂ 
            × num-args ds₂ ≡ args × RStack rank hs s₁ ds₂) 

Empty stacks, with no remote extensions, are related if the rank is 
0, whereas empty and non-empty are not. Two non-empty stacks 
are related if the elements on top are related by RStackElem and the re- 
maining stacks are related. The relation is interesting when remote 
pointer extensions are involved. If there is a remote stack extension 
but the step index is 0 then it cannot be related to a Krivine stack. 
If there is a non-zero step index then, using the same helper func- 
tion stack-ext-pred, we require that the sub-stack ds₂ of s₂ obtained 
by dropping the drop arguments required by the remote stack exten- 
sion pointer just (contptr, args, drop) is related to the Krivine stack 
s₁ using a smaller (by one) index. 

Finally, stack elements are related if they have the same head 
constructor, and the constituents are related: 

RStackElem : Heaps → Rel Krivine.StackElem DKrivine.StackElem 
RStackElem hs (arg c₁)     (arg c₂)     = RClosure hs c₁ c₂ 
RStackElem hs (if0 c₁ c₁') (if0 c₂ c₂') = RClosure hs c₁ c₂ × 
                                          RClosure hs c₁' c₂' 
RStackElem hs (op₂ f c₁)   (op₂ g c₂)   = f ≡ g × 
                                          RClosure hs c₁ c₂ 
RStackElem hs (op₁ f)      (op₁ g)      = f ≡ g 
RStackElem hs _            _            = ⊥ 



4.2 Proof outline 

In order to prove the main property we need to first establish the 
monotonicity of all the heap-indexed relations relative to heap in- 
clusion: if two configurations, machines, environments, environ- 
ment elements or stacks are related in a family of heaps hs they are 
also related in any larger family of heaps hs ⊆s hs'. The properties 
are proved in a module parameterised by the heap inclusion prop- 
erty, and therefore it does not need to be included in each statement 
- it is a background assumption: 



module HeapUpdate (hs hs' : Heaps) (inc : hs ⊆s hs') where 
  envelem   : ∀ rank el el' → REnvElem rank hs el el' 
                            → REnvElem rank hs' el el' 
  env       : ∀ e e' → REnv hs e e' → REnv hs' e e' 
  stackelem : ∀ el el' → RStackElem hs el el' 
                       → RStackElem hs' el el' 
  stack     : ∀ rank s s' → RStack rank hs s s' 
                          → RStack rank hs' s s' 
  machine   : ∀ cfg m → RMachine hs cfg m 
                      → RMachine hs' cfg m 

The proofs are largely straightforward, by induction on the structure 
of the data that each lemma is concerned with. The key auxiliary 
property that makes monotonicity of the relations true is the fact 
that any predicate which relies on heap dereferencing is preserved: 

s-ext-pred : ∀ contptr {P Q} → (∀ s → P s → Q s) → 
  stack-ext-pred hs contptr P → stack-ext-pred hs' contptr Q 

For example, for environments, environment elements and closures 
the proofs are mutually recursive, inductive on their structures: 

closure : ∀ c c' → RClosure hs c c' → RClosure hs' c c' 
envelem : ∀ rank el el' → REnvElem rank hs el el' 
                        → REnvElem rank hs' el el' 
envelem 0          (clos c) (local c') Rcc' = closure c c' Rcc' 
envelem (1 + rank) (clos c) (local c') Rcc' = Rcc' 
envelem 0          (clos c) (remote contptr index) Relel' = Relel' 
envelem (1 + rank) (clos c) (remote contptr index) Relel' = 
  s-ext-pred contptr f Relel' 
  where 
    f : ∀ s → 
        (∃ λ ee' → stack-index s index ≡ just ee' 
                 × REnvElem rank hs (clos c) ee') → 
         ∃ λ ee' → stack-index s index ≡ just ee' 
                 × REnvElem rank hs' (clos c) ee' 
    f s (ee' , si , Rcee') = ee' , si , envelem rank (clos c) ee' Rcee' 

env : ∀ e e' → REnv hs e e' → REnv hs' e e' 
env []      []        Ree' = Ree' 
env []      (x ∷ e')  Ree' = Ree' 
env (x ∷ e) []        Ree' = Ree' 
env (x ∷ e) (x' ∷ e') ((rank , Rxx') , Ree') = 
  (rank , envelem rank x x' Rxx') , env e e' Ree' 

closure (t , e) (t' , e') (Rtt' , Ree') = Rtt' , env e e' Ree' 

The soundness theorem termination-agreesS stated at the begin- 
ning of this section follows directly from two important lemmas, 
simulationS and termination-return. The former is the main technical 
result of the paper (soundness is merely a corollary of it) and the 
latter is used to handle the only non-trivial case of the soundness 
proof, that of cascading RETURN statements at the end of an exe- 
cution. 

The theorem 

simulationS : Simulation _⟶K_ _⟶S⁺_ RS 

states that RS, discussed in the previous sub-section, is a Simulation 
relation between the ⟶K and ⟶S⁺ transition relations. The sim- 
ulation relation is defined in the standard way, where _⟶_ and _⟶'_ 
are transition relations: 

Simulation : (_R_ : Rel A B) → ★ 
Simulation _R_ = ∀ a a' b → (a ⟶ a') → a R b → 
  ∃ λ b' → (b ⟶' b') × a' R b' 
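
For finite transition systems the simulation condition can be checked directly. The following Python sketch is only an illustration of the definition: the function name is_simulation and the representation of relations as sets of pairs are assumptions, and the formal development proves the property for infinite state spaces rather than checking it by enumeration.

```python
def is_simulation(rel, step_a, step_b):
    """Check the simulation condition on finite transition systems.

    rel    : set of related pairs (a, b)
    step_a : set of transitions (a, a2) of the first system
    step_b : set of transitions (b, b2) of the second system
    """
    for (a, b) in rel:
        for (src, a2) in step_a:
            if src != a:
                continue
            # Look for a matching step b -> b2 such that a2 is related to b2.
            if not any(b1 == b and (a2, b2) in rel for (b1, b2) in step_b):
                return False
    return True


# Hypothetical two-state systems.
steps_left = {(0, 1)}
steps_right = {("x", "y")}
```

With these systems, the relation {(0, "x"), (1, "y")} satisfies the condition, whereas {(0, "x")} alone does not, because the successor states would be unrelated.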



The proof of simulationS is lengthy but largely routine. The non- 
trivial cases are: 

• RETURN actions of the DKrivine machines, which are handled 
by the lemma simulation-return: 

simulation-return : ∀ n e s cfg' e' s' i nodes srank conth → 
  let cfg = (lit n , e , s) 
      hs  = proj₂ ∘ nodes 
  in cfg ⟶K cfg' → 
     all nodes except i are inactive → 
     nodes i ≡ just (lit n , e' , s') , conth → 
     RStack srank hs s s' → ∃ λ nodes' → 
     (nodes ⟶S⁺ nodes') × RS cfg' nodes' 

• VAR remote actions of the DKrivine machine, which are han- 
dled by the lemma simulation-var: 

simulation-var : ∀ t e s n e' s' nodes i conth el → 
  let hs = proj₂ ∘ nodes in 
  (∃ λ rank → REnvElem rank hs (clos (t , e)) el) → 
  (∃ λ rank → RStack rank hs s s') → 
  all nodes except i are inactive → 
  nodes i ≡ just (var n , e' , s') , conth → 
  lookup n e' ≡ just el → 
  ∃ λ nodes' → (nodes ⟶S⁺ nodes') × RS (t , e , s) nodes' 

What is interesting about these two lemmas, which establish the 
conditions under which the simulation relation is preserved by 
transitions related to the integer operations and VAR rules, is that 
they require a different proof technique, induction on the rank. This 
is because the distributed machine may need to perform a cascade 
of returns (or variable accesses) between different nodes before it 
reaches a configuration related to that of the Krivine machine, as 
we saw in the example in Sec. 3.3. 

The termination-return lemma mentioned earlier uses a similar 
proof technique (induction on the rank); its full statement is: 

termination-return : ∀ n e' s' i nodes srank conth → 
  let hs = proj₂ ∘ nodes 
  in all nodes except i are inactive → 
     nodes i ≡ just (lit n , e' , s') , conth → 
     RStack srank hs [] s' → nodes ↓S lit n 

The second part of the soundness proof is the agreement on 
divergence between the Krivine machine and the Krivine net. This 
proof relies essentially on the fact that a Krivine Net transition is 
deterministic whenever only one node is active and that the Krivine 
machine transition's codomain is decidable in the following sense 
(proofs are omitted): 

_is-deterministic-at_ : {A B : ★} (_R_ : Rel A B) (x : A) → ★ 
_R_ is-deterministic-at a = ∀ {b b'} → a R b → a R b' → b ≡ b' 

determinismS : ∀ nodes i → all nodes except i are inactive → 
  _⟶S_ is-deterministic-at nodes 

_is-decidable : {A B : ★} (_R_ : Rel A B) → ★ 
_R_ is-decidable = ∀ a → Dec (∃ λ b → a R b) 

decidableK : _⟶K_ is-decidable 
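
On a finite relation both notions have direct computational readings, sketched here in Python for illustration only. The function names and the set-of-pairs representation are assumptions made for this sketch.

```python
def is_deterministic_at(rel, a):
    """True when `a` is related to at most one element by `rel`."""
    targets = {b for (src, b) in rel if src == a}
    return len(targets) <= 1


def decide_step(rel, a):
    """Return some b with (a, b) in rel, or None when no such b exists."""
    for (src, b) in rel:
        if src == a:
            return b
    return None


# Hypothetical relation: state 0 has two successors, state 1 has one.
example_rel = {(0, "p"), (0, "q"), (1, "r")}
```

Here decide_step plays the role of the decision procedure Dec (∃ λ b → a R b): it either produces a witness or reports that the state cannot step.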

To conclude this section, we also need to show that initial 
configurations are related so that we have a starting point for the 
simulation. This is easy to prove since the environments and stacks 
are empty: 

initial-relatedS : ∀ t root → RS (t , [] , []) (initial-networkS t root) 

5. Proof of concept implementation 

We have implemented a prototype compiler for Krivine Nets [1]. 
Except for the _@_ directive, compilation to Krivine Nets is im- 
plemented by using the same standard compilation scheme used 
to compile to Krivine machines. It is the runtime system of the 
DKrivine machine that takes into account whether pointers are lo- 
cal or remote and behaves in the correct way. The _@_ directive is 
translated directly into a predefined REMOTE bytecode instruction, 
which constructs and sends a REMOTE message at runtime. As we 
said, we avoid sending code by grouping fragments of output code 
that correspond to the same node, and compiling each group as a 
separate binary. The fragment of code that corresponds to t inside a 
subterm t @ A is assigned, at compile-time, a global identifier that 
an invoking node can use to activate t on node A, meaning that no 
actual code has to be sent at runtime. 

The "bytecode" of the Krivine machine is translated into C 
functions, and message passing is implemented using MPI. The aim 
is not efficiency as much as simplicity. The compiler is not certified 
or extracted from the proofs, so we choose an implementation that 
is, as much as reasonably possible, "clearly correct." 



5.1 Comparison with GOI and GAMC 

A principled comparison with the GOI and GAMC compilers, 
which also follow the methodological principles of seamless com- 
pilation, is difficult because it cannot be a precise like-for-like com- 
parison. We summarise the differences between the three imple- 
mentations below. 

Krivine Nets and GOI implement call-by-name PCF, but not in 
the same way. Krivine Nets implement the type-free language and 
recursion is dealt with using a Y combinator in the source program- 
ming language, which is inefficient. On the other hand, the GOI 
compiler uses a specialised fixpoint constant but it also requires 
specialised machinery to handle variable contraction. So there are 
various contingent sources of inefficiencies which straightforward 
but laborious optimisations could remove. 

The GAMC compiler, on the other hand, implements a much 
larger language: a typed applied CBN lambda calculus with muta- 
ble references and concurrency. It is also tidier than the Krivine Net 
approach, in that it explicitly deallocates useless memory. These 
features require a significant amount of overhead, some of which 
already is present in the DKrivine infrastructure but some of which 
will need to be subsequently added. 

However, there are some features that make the comparison of 
the three compilers meaningful. The first one is that the programs 
we use are virtually identical and the fact that they all use CBN 
means no language-level considerations come into play. The sec- 
ond one is that all three compilers are written as representations of 
the semantic model of the language, with a similar level of disre- 
gard for optimisations against a similar level of concern for "obvi- 
ous" correctness. The third one is that all three target C and MPI, 
meaning that benchmarks can be run on the same computer. 

With these significant caveats in mind we will attempt a rough 
performance comparison of the three compilers in several ways. 
Our benchmarks are small programs operating on integers: 

arith: Computing the sum of applying a complicated integer 
function to the numbers in the sequence 0, . . . , 299. 

fib: Computing the 10th Fibonacci number (using the exponential 
algorithm) 100 times and taking the sum. 

root: Computing the (integer) root of a polynomial using 20 itera- 
tions of the bisection method. 



Krivine baseline. We take the classic Krivine machine as a refer- 
ence point and run the compilers in a degenerate single-node mode. 
This gives a rough measure of the overall overhead of the compiler 
before communication costs even come into play. In the case of the 
GOI compiler the overheads are mainly due to the implementation 
of contraction, whereas in the case of GAMC they are due to the 
large amount of heap allocation and deallocation. However, note 
that the discrepancy would be even greater if we had a fixpoint op- 
erator in the Krivine machine instead of relying on the term-level 
Y combinator. 

            arith             fib               root 
Baseline    100% (0.34s)      100% (0.094s)     100% (0.009s) 
GOI         3,042% (10.3s)    2,832% (2.7s)     20,222% (0.18s) 
GAMC        765% (2.6s)       395% (0.53s)      356% (0.032s) 
DKrivine    131% (0.44s)      141% (0.13s)      233% (0.021s) 

Single node baseline. We measure each compiler using its own 
single-node performance as a reference point and we split the 
program in two nodes such that a large communication overhead 
is introduced. We measure it both in terms of relative execution 
time and in terms of average and maximum size of the messages, 
in bytes. Note that the overheads are only due to the processing 
required by the node to send and receive the messages and not due to 
network latencies. In order to factor these out we run all the (virtual) 
MPI nodes on the same physical computer. 

The data is shown in Tab. 1 and we can see that the DKrivine 
compiler is not only faster for local execution, but also has a com- 
paratively small communication overhead. Each time entry in the 
table is relative to the same compiler's local execution time, and 
the absolute time is shown in parentheses. We can see that DKriv- 
ine is well ahead of the others in terms of absolute execution time. 
Both GAMC and DKrivine use messages of bounded size, whereas 
GOI's messages grow, sometimes significantly, during execution. 
The high overhead across all three compilers for the root bench- 
mark is because it does a relatively small amount of local compu- 
tations before it needs to communicate. We suspect that the high 
overhead for GOI and GAMC in many benchmarks is also due to 
the large amount of "bookkeeping" C code that is required, even 
for simple terms. The way the C compiler optimiser works plays an 
important role in the performance gap between single node and dis- 
tribution. When all the code is on the same node the functions are 
aggressively inlined because they belong to the same binary out- 
put. When the code is distributed this is no longer possible. Also, 
an analysis of the produced code shows that the C optimiser gen- 
erally struggles with the code for the distributed nodes, because it 
does not have a view of the whole program. 

6. Previous and related work 

Programming languages and libraries for distributed and client- 
server computing (which can be seen as a particularly simple form 
of distribution) are a vast area of research. Relevant to us are 
functional programming languages for distributed execution, and 
several surveys are available [21, 32]. 

Functional programming languages for distributed systems take 
different approaches in terms of process and communication man- 
agement. Languages such as ERLANG, which are meant for system- 
level development offer a fairly low-level view of distribution in 
which both process and communication are managed explicitly; 
this is the language we used for contrasting effect in the intro- 
duction. To tame communication some languages in this category 
use mechanisms imported from process-calculi, such as PICT [33]. 
Programming languages do not need to be created from scratch 
to include improved language support for communication. Session 
types have been used to extend a variety of languages, including 
functional languages, with better communication primitives [34] 
or, alternatively, to provide language-independent frameworks for 
integrating distributed applications, such as SCRIBBLE. 

Our approach is, however, quite different. We aim to make com- 
munication implicit, or seamless. In some sense this is already 
widely used in programming practice, especially in the context of 
client-server applications, in the form of remote procedure calls 
(RPC) and related technologies such as Simple Object Access Pro- 
tocol (SOAP). What we aim to do is to integrate these approaches 
into the programming language so that from a programmer's perspec- 
tive there is no distinction between a remote and local call, even at 
higher order. Perhaps the closest to our aim is Remote Evaluation 
(REV) [31], another generalisation of RPC, which enables the use 
of higher-order functions across node boundaries. The main differ- 
ence between REV and our work is that REV relies on sending 
unevaluated code. The REV approach evolved into a variety of mo- 
bile code languages [7] which add several layers of sophistication 
to this approach, but have evolved in a direction that is not directly 
relevant to transparent distribution. 

The EDEN project [22], an implementation of parallel HASKELL 
for distributed systems which keeps most communication implicit, 
is also close to our aims. Another similarity to our work is that 
the specification of the language is tiered: an operational seman- 
tics at the level of the language and an abstract-machine semantics 
for the execution environment, the Distributed Eden Abstract Machine 
(DREAM) [5]. EDEN is not perfectly seamless: a small set of syn- 
tactic constructs are used to manage processes explicitly and com- 
munication is always performed using head-strict lazy lists. There 
are significant technical differences between DREAM and Kriv- 
ine Nets since the DREAM is a mechanism of distribution for the 
Spineless Tagless G-machine [18] whereas we develop the Krivine 
machine. Also, in terms of emphasis, EDEN is an implementation- 
focussed project whereas we want to create a firm theoretical foun- 
dation on which compilation to distributed platforms can be carried 
out. Whereas (as far as we know) no soundness results exist for the 
DREAM, we provide a fully formalised proof. 

Other similar implementation-oriented projects are for tierless 
client-server computing such as LINKS [8], where "tierless" has a 
similar meaning to our use of "seamless". The execution mecha- 
nism that LINKS builds on, the client/server calculus [9], is spe- 
cialised to systems with two nodes, namely client and server. The 
two nodes are not equal peers: the server is designed to be state- 
less to be able to handle a large number of clients. The work on 
the client/server calculus also spawned work on a more general 
parallel abstract machine, LSAM, that handles an arbitrary num- 
ber of nodes [26]. A predecessor to LSAM, called DML, uses a 
similar abstract machine but for a richer language [28]. The main 
difference between these machines and Krivine Nets is that they 
are based on higher-level machines for call-by-value lambda cal- 
culi, that use explicit substitutions and are therefore less straight- 
forward to use as a basis for compilation. In contrast to our work, 
they also assume synchronous communication models. 

Abstract machines for distributed systems have also been stud- 
ied. In fact, as early as 1980 a formal proposal for standardising 
distributed computing using an abstract machine model was put 
forth, although it did not catch on [30]. The DREAM, DML and 
the LSAM are, as far as we are aware, the only abstract machines 
for general distributed systems which, like the DKrivine machine, 
combine conventional execution mechanisms with communication 
primitives. Abstract machines for communication only have been 
proposed [16], inspired by the CHAM (which we also take inspi- 
ration from, to model the communication network), but they only 
deal with half the problem when it comes to compiling conven- 
tional languages. 

Finally, we mention the compilation of conventional program- 
ming languages to (possibly) distributed architectures via process 
calculi, such as PICT [33], which also uses an abstract machine 
with communication primitives. We have studied techniques based 
on interaction semantics in prior work, using the Geometry of In- 
teraction [13] or Game Semantics [14]. Although such more ex- 
otic approaches can be effective at creating correct and transparent 
distribution, it seems to be the case that the single-node execution 
model is bound to be less efficient than that of conventional abstract 
machines. Without over-emphasising efficiency at this early stage, 
it is also the case that interfacing code compiled using conventional 
techniques with code compiled using exotic techniques is difficult 
and leads to problems with interoperability via foreign-function in- 
terfaces. To us this is a significant shortcoming which the current 
work seeks to avoid. 

Table 1. Benchmarks for distribution overheads 

7. Conclusion 

In this paper we have presented a method of distributing the execu- 
tion of the Krivine machine into what we call a Krivine Net. This 
gives us a principled compilation model of the applied CBN lambda 
calculus to an abstract distributed architecture. Our main results 
are a rigorous, fully formalised, proof of correctness of the Kriv- 
ine Net by comparing it to the conventional Krivine machine, and 
a proof-of-concept compiler which allows us to compare this com- 
pilation scheme with alternative methods based on other abstract 
machines and more exotic semantics such as Geometry of Interac- 
tion and Games. Compared to the more implementation-oriented 
prior work on transparent (or tierless or seamless) compilation to 
distributed or client-server architectures, our emphasis is on 
correctness. We believe that our main contribution is a 
theoretically firm starting point for the principled study of 
compilation targeting such architectures. 

A broader question worth asking is whether this transparent and 
integrated approach to distributed computing is practical. There are 
two main possible objections: 

Performance Some might say that higher-level languages have 
poorer performance than system-oriented programming languages, 
which makes them impractical. This debate started when 
Backus proposed FORTRAN as a machine-independent programming 
language, and has carried on fruitlessly ever since. 
We believe that the full spectrum of languages, from machine 
code to the most abstract, is worth investigating seriously. 
Seamless computing focusses on the latter, somewhat in the ex- 
treme, in the belief that the principled study of heterogeneous 
(not just distributed, but also reconfigurable etc.) compilation 
techniques will broaden and deepen our understanding of pro- 
gramming languages in general. And, if we are lucky and dili- 
gent, it may even yield a practical and very useful programming 
paradigm. 

Control Distributed computing raises certain specific obstacles in 
the way of using higher-level languages seamlessly, and this 
leads to more cogent arguments against their use. A distributed 
architecture is more volatile than a single node because indi- 
vidual nodes may fail, communication links may break and 
messages may get lost. Because of this, a remote call may fail 
in ways that a local call may not. Is it reasonable to present 
them to the programmer as if they are the same thing? We ar- 
gue that there is a significant class of applications where the 
answer is yes. If the programmer's objectives are algorithmic 
rather than the development of systems, it does not seem right to 
burden them with the often onerous task of failure management 
in a distributed system. Another argument against higher-level 
languages is that they may hide the details of the program's 
dataflow and not provide enough control to eliminate 
bottlenecks. To us it seems that managing both failure and 
dataflow issues in distributed algorithmic programming 
requires a separation of concerns. Suitable runtime systems 
must present a more robust programming interface; MapReduce 
[10] and CIEL [25] are examples of execution engines 
with runtime systems that automatically handle configuration 
and failure management aspects, the latter supporting dynamic 
dataflow dependencies. If more fine-grained control is required, 
then separate deployment and configuration policies which are 
transparent to the programmer should be employed. In general, 
we believe that the role and the scope of orchestration lan- 
guages [6] should be greatly expanded to this end. 

7.1 Further work 

In this paper we largely ignored the finer issues of efficiency. Our 
aim was to support in-principle efficient single-node compilation, 
which happens when the DKrivine machine executes trivially on a 
single node as a Krivine machine, and to reduce the communication 
overhead by sending only small (bounded-size) messages which are 
necessary. For example, our use of views of remote stack extensions 
avoids the need to send pop messages. In the future we would 
like to examine the possibility of efficient compilation, on the 
hunch that this could be a practical programming paradigm for 
distributed computing. To do this, several immediate efficiency 
issues must, and can, be addressed. 

Remote pointers In the RPC literature it is sometimes argued that 
a shared or virtual address space, which is where our dis- 
tributed heap of continuation stacks lives, is prohibitively ex- 
pensive. However, research progress in tagged pointer repre- 
sentation [24] suggests that we can use pointer tags to distin- 
guish between local and remote pointers without even having 
to dereference them. With such tags we would pay a very low, 
if any, performance penalty for the local pointers. 

Garbage collection The execution of the Krivine Net creates 
garbage in the machines. Distributed garbage collection can be 
a serious problem [29], but we have strong reasons to believe 
that it can be avoided here, because the heap structures that 
get created are quite simple. Most importantly, there are never 
circular linked structures, otherwise the relations would not be 
well founded. This means that a simpler method, reference- 
counting, can be used [4]. We also know that efficient mem- 
ory management is possible when compiling CBN functional 
programming languages to distributed architectures. The GOI 
compiler is purely stack-based, while the GAMC compiler uses 
heaps but does explicit deallocations of locations that are no 
longer needed. 

Shortcut forwarding One of the most unpleasant features of the 
current Krivine Net approach is the excessive forwarding of 
data, especially on remote RETURN. A way to alleviate this 
issue might be to not create indirections when a node has a 
stack consisting only of a stack extension at the time of a remote 
invocation, meaning that the remote node could return directly 
to the current node's invoker. However, the implementation is 
complex enough to raise non-trivial issues of correctness and 
therefore this falls outside the scope of this paper. 
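
The remote-pointer and garbage-collection points above can be 
illustrated with a small sketch. It is not taken from our prototype 
compiler: the choice of the lowest bit as the tag, the 8-bit node 
field, and the names Heap, Cell and outbox are all assumptions made 
for illustration. The sketch shows that a pointer can be classified 
as local or remote from its tag alone, and that reference counting 
suffices to reclaim a chain of stack cells because the chain has no 
cycles. 

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

REMOTE_TAG = 1  # assumption: the lowest bit marks a remote pointer
NODE_BITS = 8   # assumption: width of the node-identifier field


def make_local(index: int) -> int:
    """Encode a local heap index; the tag bit is left clear."""
    return index << 1


def make_remote(node: int, index: int) -> int:
    """Encode (node, index) as a remote pointer with the tag bit set."""
    return (((index << NODE_BITS) | node) << 1) | REMOTE_TAG


def is_remote(ptr: int) -> bool:
    """Classify a pointer by its tag only, without dereferencing it."""
    return (ptr & REMOTE_TAG) == REMOTE_TAG


def decode_remote(ptr: int) -> Tuple[int, int]:
    """Recover (node, index) from a remote pointer."""
    payload = ptr >> 1
    return payload & ((1 << NODE_BITS) - 1), payload >> NODE_BITS


@dataclass
class Cell:
    parent: Optional[int]  # tagged pointer to the rest of the stack
    refs: int = 1          # reference count; the allocator holds one


class Heap:
    """Hypothetical per-node heap of acyclic, reference-counted cells."""

    def __init__(self) -> None:
        self.cells: Dict[int, Cell] = {}
        self.next_index = 0
        # Remote cells whose owner must be told to decrement a count.
        self.outbox: List[Tuple[int, int]] = []

    def alloc(self, parent: Optional[int] = None) -> int:
        """Allocate a cell that takes over the caller's parent reference."""
        index = self.next_index
        self.next_index += 1
        self.cells[index] = Cell(parent)
        return make_local(index)

    def release(self, ptr: Optional[int]) -> None:
        """Drop one reference, freeing cells along the chain as needed."""
        # A loop is enough: the chain has no cycles, so it terminates.
        while ptr is not None:
            if is_remote(ptr):
                self.outbox.append(decode_remote(ptr))
                return
            index = ptr >> 1
            cell = self.cells[index]
            cell.refs -= 1
            if cell.refs > 0:
                return
            del self.cells[index]
            ptr = cell.parent
```

In this sketch, releasing the top of a two-cell local chain whose 
bottom cell points to a remote cell frees both local cells and 
queues a single decrement request for the remote owner. Local 
pointers pay only for one bit test. 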

We also plan to improve the programming language by adding 
more expressiveness, such as parallelism, assignable state and 
algebraic datatypes. Some of these features have already been 
implemented in a compiler based on game semantics, but we hope 
they will be more efficient in the current setting. We also aim to 
enrich the type system to pay more attention to possibly unrealistic 
patterns of distribution. Type systems such as ML5 [35] can 
ensure that the interaction between local and remote resources is 
safe; in our approach all such interactions are safe, but some can 
be extremely inefficient, and a type system can at least issue 
warnings against unreasonable deployments. Finally, another language-level 
development that could be useful is the principled development of 
configuration, deployment and, more generally, choreography lan- 
guages. 

Our dream is the eventual development of an end-to-end seamless 
distributed compiler for a higher-order imperative and parallel 
functional programming language, along the lines of the 
CompCert project [20]. The formalisation of the correctness of the 
Krivine Net, relative to the conventional Krivine machine, is the 
first, but also technically the most demanding, step. 